10. TD Control: Sarsamax

Check out this (optional) research paper for a proof that Sarsamax (also known as Q-learning) converges.
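For reference, the update whose convergence is in question can be sketched in a few lines. This is a minimal illustration of the standard Sarsamax (Q-learning) rule, Q(s, a) ← Q(s, a) + α (r + γ max_a' Q(s', a') − Q(s, a)), not code from the lesson or the paper. The function name `sarsamax_update`, the `done` flag, the state labels, and the numeric values are all hypothetical choices made for the example.

```python
from collections import defaultdict


def sarsamax_update(Q, state, action, reward, next_state, alpha, gamma, done=False):
    """Apply one Sarsamax (Q-learning) update to Q in place; return the new estimate."""
    # Value of the greedy action in the next state (zero if the episode has ended).
    best_next = 0.0 if done else max(Q[next_state])
    # TD target uses the max over next actions, regardless of the action actually taken next.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]


# Hypothetical toy setup: two actions per state, all estimates start at zero.
Q = defaultdict(lambda: [0.0, 0.0])
Q["s1"] = [1.0, 3.0]

# From state "s0", action 0 yields reward 1.0 and leads to "s1".
new_value = sarsamax_update(Q, "s0", 0, reward=1.0, next_state="s1", alpha=0.5, gamma=0.9)
print(new_value)  # approximately 1.85, i.e. 0.5 * (1.0 + 0.9 * 3.0)
```

The constant `alpha` here is only for illustration. Convergence proofs for this update generally assume that every state-action pair continues to be visited and that the step size decays appropriately over time.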